Genomic and taxonomic evaluation of 38 Treponema prophage sequences

Background Despite Spirochaetales being a ubiquitous and medically important order of bacteria infecting both humans and animals, there is extremely limited information regarding their bacteriophages. For the genus Treponema, only a single characterised prophage has been reported.
Results We applied a bioinformatic approach to 24 previously published Treponema genomes to identify and characterise putative treponemal prophages. Thirteen of the genomes did not contain any detectable prophage regions. The remaining eleven contained 38 prophage sequences, with between one and eight putative prophages in each bacterial genome. The prophage regions ranged from 12.4 to 75.1 kb and contained between 27 and 171 protein coding sequences. Phylogenetic analysis revealed that 24 of the prophages formed three distinct sequence clusters, with features suggesting myoviral and siphoviral morphologies. ViPTree analysis demonstrated that the identified sequences were novel when compared with known double-stranded DNA bacteriophage genomes.
Conclusions In this study, we have begun to address the knowledge gap on treponeme bacteriophages by characterising 38 prophage sequences in 24 treponeme genomes. Using bioinformatic approaches, we identified and compared the prophage-like elements with respect to other bacteriophages, their gene content and their potential to be functional, inducible bacteriophages, which in turn can help focus attention on specific prophages for further investigation.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10461-5.


Background
Bacteriophages (phages) are viruses that are obligate intracellular parasites of bacteria [1]. These important bacterial predators are the most abundant biological entities on Earth, with the global phage population estimated at around 10³¹ [2,3]. Despite this well-acknowledged abundance, as of August 2023 a comparatively small number of phage genomes, approximately 44,000, had been deposited with NCBI [4], the majority of them from representatives of the Caudoviricetes, the class of tailed phages [5]. Phages exhibit different lifestyles. They can be lytic, swiftly killing their bacterial host cells upon replication and release, or lysogenic, integrating their genome into the host DNA to form a prophage. Additionally, phages may adopt pseudolysogeny, often under conditions that cause suboptimal growth of the host bacteria, triggering a stage of stalled development during which neither phage genome replication nor prophage formation occurs [6,7]. Filamentous phages can also establish chronic infections, in which progeny are slowly released from the host cell over an extended period without causing cell death [8].
In the lysogenic state, integrated genomes are transmitted to daughter cells during bacterial replication. Prophages can be functional or nonfunctional [9]. In most cases, the lysogenic cycle also allows exit into the lytic cycle upon induction; such inducible phages are able to form infectious particles. Nonfunctional, or cryptic, prophages harbour deletions, insertions and rearrangements that render them unable to complete the lytic cycle [10].
Prophages have been shown to have a substantial influence on their host genomes and are recognised as key drivers of evolutionary change in prokaryotic communities, often by enabling genome plasticity and altering host phenotypes [11]. In particular, prophages can increase the virulence of pathogens by encoding toxins and antimicrobial resistance determinants, and by altering host bacterial properties relevant to all stages of the infectious process [12].
Due to increasing bacterial resistance to antibiotics and a dearth of new antibiotics coming onto the market, there is growing interest and research in phage therapy to combat this major threat to public health [13]. Lytic phages have traditionally been favoured over temperate phages as therapeutic agents, as they are lethal to bacteria, akin to antibiotics, and are likely easier to gain approval as a treatment for bacterial infections [14]. However, temperate phages have also been investigated for phage therapy, both following genetic manipulation to remove the genes essential for lysogeny [15-18] and after the discovery of spontaneous mutations preventing lysogeny among environmental isolates [19,20]. These formerly temperate phages have been used to successfully treat bacterial infections in vivo [15]. Other options could also be explored, for example, using temperate phages to introduce, by lysogeny, genes conferring sensitivity to antibiotics to which the pathogen had previously been resistant [21]. Another study [22] demonstrated that Clostridium difficile phages, despite encoding integrases, all entered the lytic pathway, and so have potential as a future treatment even though they are able to enter the lysogenic cycle. Currently, such non-lytic phages are not preferred by regulatory bodies for phage therapy; however, all of these avenues warrant investigation.
Our understanding of phage infections in spirochetes is notably limited compared with other prokaryotes. In particular, knowledge of phages infecting Treponema species is still in its infancy, with only a small number of reports, mostly electron microscopy observations, documenting such occurrences [23-28]. To our knowledge, only one Treponema prophage, phage td1 from the genome of Treponema denticola, has been successfully induced and characterised in any detail [28].
The genus Treponema is of significant medical importance for both humans and animals, encompassing pathogens responsible for human and veterinary diseases such as syphilis, yaws, bejel, periodontal disease, Leporidae syphilis and bovine digital dermatitis [29,30], as well as being associated with various necrotising infections, such as noma [31]. Historically, the comprehensive study of treponemes and their biology has been hindered by their fastidious nature, which makes isolation and cultivation difficult [32]. However, cultivation of treponemes has recently become more commonplace as their specific growth requirements can now be met [33], making the study of treponemes and their phages more feasible.
The post-genomic era offers an opportunity to characterise in detail the spirochete-infecting phages that are present as prophages in available bacterial genomes. The complete genomes of a substantial number of treponeme species, isolated from diverse environments, have been sequenced and can be analysed for the presence of phages [34].
The objective of this study was to use a bioinformatic approach to examine 24 complete Treponema genomes available when NCBI was queried (11th December 2022), to identify and characterise treponeme prophages at the genomic level.

Identifying putative prophages in genomes of Treponema
The dataset comprised 24 complete Treponema genomes representing 16 treponemal species, accessed via GenBank. Because a combination of tools is required when detecting novel phages [35], PHASTER, PHASTEST and geNomad were used to identify prophage-like elements within these genomes, alongside a comprehensive manual review of each treponemal genome according to the criteria stated in the Methods. PHASTER identified 49 regions, PHASTEST 25 regions and geNomad 37 regions, while manual inspection identified 52 regions (Fig. 1). All identified regions were then assessed with CheckV, and any sequences failing CheckV verification as a putative prophage were removed. This pipeline resulted in 38 prophage sequences, each identified by at least two prophage detection approaches, except for the prophage detected in T. bryantii, which was identified by manual inspection only. PHASTEST identified putative att sites for seven prophages. The att site sequence predicted for the prophage in T. denticola differs from the td1 phage attB site reported by Mitchell et al. [28] after they induced the prophage.
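The consensus step described above (retaining regions called by at least two approaches) can be illustrated with a short Python sketch. This is not the authors' pipeline: the `Region` type, the `consensus` function and the coordinates are hypothetical, overlap is used as the (assumed) criterion for two calls describing the same prophage, and neither the CheckV filter nor the manual-only T. bryantii exception is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One predicted prophage interval (1-based, inclusive coordinates)."""
    method: str   # e.g. "PHASTER", "PHASTEST", "geNomad", "manual"
    contig: str
    start: int
    end: int


def consensus(regions: list[Region], min_methods: int = 2) -> list[tuple[str, int, int, set[str]]]:
    """Merge overlapping calls on the same contig and keep merged regions
    supported by at least `min_methods` distinct detection methods."""
    merged: list[tuple[str, int, int, set[str]]] = []
    for r in sorted(regions, key=lambda x: (x.contig, x.start)):
        if merged and merged[-1][0] == r.contig and r.start <= merged[-1][2]:
            # Overlaps the current merged region: extend it and record the method.
            contig, start, end, methods = merged[-1]
            merged[-1] = (contig, start, max(end, r.end), methods | {r.method})
        else:
            # No overlap with the previous region: start a new merged region.
            merged.append((r.contig, r.start, r.end, {r.method}))
    return [m for m in merged if len(m[3]) >= min_methods]


# Hypothetical calls on a single replicon (coordinates are invented).
calls = [
    Region("PHASTER", "chr", 1_000, 40_000),
    Region("geNomad", "chr", 1_500, 41_000),
    Region("manual", "chr", 900, 39_000),
    Region("PHASTEST", "chr", 200_000, 215_000),
]
print(consensus(calls))
```

Here the three overlapping calls merge into one region (900 to 41,000) supported by three methods, while the PHASTEST-only call is dropped. Because overlaps are merged transitively, two adjacent prophages bridged by one long call would collapse into a single region, so boundaries would still need manual review.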
Approximately half (13/24, 54%) of the treponemal genomes interrogated did not contain any potential prophage regions, while the remaining genomes (11/24, 46%) yielded 38 putative prophage regions. The number of prophage-like sequences varied from one to eight per genome, with lengths ranging from 12.4 kb to 75.1 kb and encoding between 27 and 171 potential protein coding sequences. For context, the smallest known tailed phage genomes are approximately 11.5 kb for podoviral morphology [36], 21 kb for siphoviral morphology [37] and 30 kb for myoviral morphology [38]. The prophage regions had an average guanine plus cytosine (GC) content of 41.6%, closely resembling the average GC content of their respective Treponema host strains (Table 1). Treponema phagedenis B43.1 contained the most prophage DNA, at 12.8% of its genome (eight prophage regions).
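The genome-level figures above, such as the 12.8% prophage content of T. phagedenis B43.1 and the mean GC content of the prophage regions, are simple ratios. The sketch below shows one way to compute them; it assumes 1-based inclusive, non-overlapping region coordinates and ignores ambiguous bases (e.g. N) when computing GC content. The function names and example values are hypothetical and are not taken from Table 1.

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases (ambiguous bases ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def prophage_fraction(genome_length: int, regions: list[tuple[int, int]]) -> float:
    """Percentage of a genome covered by non-overlapping, 1-based inclusive regions."""
    covered = sum(end - start + 1 for start, end in regions)
    return 100.0 * covered / genome_length


# Hypothetical example: two regions (60 kb and 30 kb) in a 1 Mb genome.
print(prophage_fraction(1_000_000, [(1, 60_000), (100_001, 130_000)]))
print(round(gc_content("ATGCGCNN"), 1))
```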

Genome-based phylogeny of the Treponeme infecting prophages
Multiple bioinformatic methods were then used to characterise and investigate the genomic diversity of the prophages. A phylogenetic tree of the 38 prophage regions was created with VICTOR (Fig. 2), using intergenomic distances based on protein-protein BLAST comparisons of the whole viral proteomes to infer evolutionary relationships between the predicted prophages. The genome comparison highlighted three clusters, each comprising at least four prophage sequences predicted to belong to the same genus, all derived from T. phagedenis strains isolated from either bovine digital dermatitis lesions or human samples, and from different geographical regions. Cluster A comprises ten prophage regions, cluster B a further ten and cluster C four. A fourth, less closely related cluster, whose members are also predicted to belong to a single genus, can be seen at the top of the figure, consisting of three prophages from T. primitia and one from T. azotonutricium. With the exception of ReiterP2, which appears to belong to a lineage related to cluster B, the remaining prophage sequences show little or no genetic relationship to any of the other Treponema prophage sequences identified.
The 38 prophage sequences were then analysed with VIRIDIC (Fig. 3) to obtain intergenomic similarity values, the standard used by the International Committee on Taxonomy of Viruses (ICTV) to classify phages at the genus and species levels [39]. Notably, VIRIDIC identified the same three T. phagedenis clusters as VICTOR (Fig. 2), highlighted in blue and green in the right-hand heat map in Fig. 3. VIRIDIC has the benefit of reporting the percentage similarity of the genome alignment: some genomes in these clusters share up to 96.2% similarity (range 58.2%-96.2%) (Fig. 3). VIRIDIC showed that the members of the less closely related cluster identified by VICTOR, PrimP1, PrimP2, PrimP3 and AzoP1 (Fig. 2), share between 21.6% and 31.7% similarity, and that ReiterP2 shares between 46% and 59% similarity with the prophage regions in cluster B.
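As we understand the formula published for VIRIDIC, the similarity between genomes A and B combines the identical aligned bases found in each direction of the BLASTN comparison: SIM = (idAB + idBA) × 100 / (lA + lB), where lA and lB are the genome lengths. The sketch below applies that formula together with the 95% (species) and 70% (genus) thresholds given in the Fig. 3 caption, treating each threshold as inclusive (an assumption). The function names and example counts are hypothetical, and deriving idAB and idBA from the BLASTN alignments, which VIRIDIC does internally, is not shown.

```python
def intergenomic_similarity(id_ab: int, id_ba: int, len_a: int, len_b: int) -> float:
    """Similarity (%) = (idAB + idBA) * 100 / (lenA + lenB).

    id_ab: identical bases of genome A found in alignments to genome B
    id_ba: identical bases of genome B found in alignments to genome A
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("genome lengths must be positive")
    return (id_ab + id_ba) * 100.0 / (len_a + len_b)


def taxonomic_rank(similarity: float, species_cut: float = 95.0, genus_cut: float = 70.0) -> str:
    """Classify a genome pair using the species and genus thresholds from Fig. 3."""
    if similarity >= species_cut:
        return "same species"
    if similarity >= genus_cut:
        return "same genus"
    return "different genera"


# Hypothetical pair: 70 kb and 66 kb genomes with 60,000 and 58,000 identical aligned bases.
sim = intergenomic_similarity(60_000, 58_000, 70_000, 66_000)
print(f"{sim:.1f}% -> {taxonomic_rank(sim)}")
```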

Proteome-based classification of the treponeme infecting prophages
VirClust analysis provides visualisation and details of protein clustering across the prophage sequences and infers phylogenetic relationships (Fig. 4). These results identified the same three main clusters as VICTOR (Fig. 2) and VIRIDIC (Fig. 3), which can easily be seen in the heat map of protein clustering (cluster B = 1, clusters A and C = 2).
All 38 prophage sequences were submitted to ViPTree, which, like VICTOR, compares genomes at the amino-acid level using BLAST (tBLASTx in the case of ViPTree), to determine their phylogenetic position relative to a global reference database of dsDNA viruses. The final tree contained 2837 entries, and all of the putative Treponema prophages clustered closely together with one exception, VinP1 (Fig. 5), which appears to be more closely related to Vibrio and Escherichia phages than to any of the other treponemal prophages identified in this study. Clusters A, B and C fall within shared lineages that also include RuP1, VinP2 and td1; notably, ReiterP2 was placed within cluster C, highlighting its close association with 27087P1. ViPTree also grouped eight prophages that had not been assigned to a cluster into a further distinct lineage. The remaining unassigned prophage, BryP1, belonged to a lineage that appears to be more closely related to Flavobacterium and Cellulophaga phages (Fig. 5).

Table 1 Three distinct clusters of treponeme prophages and description of each prophage
The table shows the prophages in each cluster, their size and GC content, whether any defence systems were detected, and the morphology suggested by their features. *The predicted morphology of the prophage-like sequences is based on the presence of a tail sheath protein (indicating myoviral morphology) or a tail length tape measure protein without a tail sheath protein (indicating siphoviral morphology).

Characterisation of the three main Treponema prophage clusters
The 24 prophage sequences forming the three primary Treponema prophage clusters from T. phagedenis were selected for in-depth analysis (Table 1). A visual alignment of the prophages in each cluster was created using Clinker (Figs. 6, 7 and 8). PADLOC was used to identify anti-viral defence systems, and PhageLeads and Pharokka were used to identify any virulence or antimicrobial resistance genes within the prophages that could benefit the host bacteria.

Cluster A
There were ten putative prophages in cluster A, ranging from 52.5 to 73 kb in length and encoding 71 to 102 protein coding sequences (Fig. 6). All include a gene encoding a tail sheath protein and are therefore likely to have myoviral morphology [40,41]. Six prophages (27087P2, 27087P3, KS1P4, KS1P5, B43P4 and B43P5) encode an integrase, a terminase and several conserved structural protein domains in the expected order (terminase, portal, protease, scaffold, major head shell (coat) protein, head/tail joining proteins, tail shaft protein, tape measure protein, tail tip/baseplate proteins, tail fibre) and so have the potential to be intact [9,42]. However, CheckV classified only 27087P3, B43P4 and B43P5 as high quality (91% complete, 73 kb in length), while prophages 27087P2, KS1P4 and KS1P5 are shorter (66-67 kb) and were considered medium quality. Prophage T320AP2 contains an integrase but no terminase and was considered low quality by CheckV, and prophages ReiterP1, B43P7 and KS1P7 contain a terminase but no integrase and were considered medium quality (Fig. 6). PADLOC identified only methyltransferase proteins, in B43P4, B43P5, KS1P4, KS1P5, 27087P3 and T320AP2, and no virulence or antibiotic resistance genes were detected by Pharokka or PhageLeads.
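The check for a structural module in the expected order can be expressed as an ordered-subsequence test over CDS annotations. The following Python sketch is a hypothetical keyword heuristic, not the procedure used in this study: the keyword tuple simply mirrors the order quoted above, and real product names (for example "head-tail connector" rather than "head/tail joining protein") would need a curated synonym list.

```python
from __future__ import annotations

# Hypothetical keywords mirroring the module order quoted above.
CANONICAL_MODULE = (
    "terminase", "portal", "protease", "scaffold", "major head",
    "head/tail joining", "tail shaft", "tape measure", "baseplate",
    "tail fib",  # matches both "tail fibre" and "tail fiber"
)


def module_in_order(products: list[str], module: tuple[str, ...] = CANONICAL_MODULE) -> bool:
    """Return True if every module keyword matches a CDS product name and the
    matches occur in the same order as `module` (an ordered subsequence)."""
    i = 0
    for keyword in module:
        # Advance until a product containing this keyword is found.
        while i < len(products) and keyword not in products[i].lower():
            i += 1
        if i == len(products):
            return False  # keyword absent, or present only before the previous match
        i += 1  # consume the matched CDS so the next keyword must occur later
    return True


# Hypothetical annotation of a prophage, listed in genomic order.
genes = [
    "integrase", "hypothetical protein", "terminase large subunit", "portal protein",
    "prohead protease", "scaffold protein", "major head shell protein",
    "head/tail joining protein", "tail shaft protein", "tail tape measure protein",
    "baseplate protein", "tail fibre protein",
]
print(module_in_order(genes))
```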

Cluster B
CheckV classified all sequences in this cluster as low to medium quality (Table 1). Examination of the wider bacterial genome on either side of these sequences identified no further phage coding sequences. PADLOC identified a Thoeris type I system in B43P2 (CDS 4 and 5, Fig. 7) and a type II restriction-modification (R-M) system in B43P8 (CDS 10 and 12). Using UniProt, the R-M system in B43P8 was found to share the highest percentage identity with a restriction endonuclease (REase) (85% identity) and a methyltransferase (MTase) (88.9% identity) from Selenomonas sputigena, an anaerobic Gram-negative bacterium.

Cluster C
Four prophages were identified in cluster C (Fig. 8), ranging from 44.2 to 65.3 kb in length and encoding 72 to 88 protein coding sequences; all were considered medium quality by CheckV (Table 1). All four prophage sequences encode a tail length tape measure protein gene of the same length (4718 bp), indicating potential siphoviral morphology. T320AP1 contains a short region dissimilar to any other prophage in the cluster (CDS 70-84, corresponding to C5078_00805 to C5078_00770 in the T. phagedenis T320A bacterial genome) (Fig. 8). Only CDS 73 was identified as a likely phage protein (phage family protein) by UniProt. PADLOC identified a type II R-M system in T320AP1 (CDS 70 and CDS 72). UniProt identified the MTase as most similar to the EcoRI modification methylases of Alysiella crassa and Prevotella corporis, sharing 67.8% and 60.4% identity, respectively. Both are Gram-negative bacteria, Alysiella being motile and aerobic and Prevotella anaerobic and non-motile. The REase was found to be most similar to a Campylobacter hominis

Fig. 3
Fig. 3 Intergenomic similarity analysis of the 38 Treponema prophage sequences using VIRIDIC, presented as a heatmap of intergenomic similarity values (right half) and alignment indicators (left half and top annotation). In the right half, the more closely related the genomes, the darker the colour; the numbers represent the similarity values for each genome pair, rounded to the first decimal. In the left half, darker colours emphasise low values, indicating genome pairs where only a small fraction of the genome was aligned (orange to white colour gradient) or where there is a large difference in the length of the two genomes (black to white colour gradient). The reward and penalty scores for matching and mismatching bases were set to 1 and −2, respectively, the same as the default parameters of NCBI_BLASTN. The species and genus threshold values were set to 95% and 70% intergenomic similarity, respectively.

Discussion
Despite the ubiquity and medical significance of the genus Treponema [43], surprisingly little is known about its phages. In the current study, we sought to develop foundational knowledge of a subset of phages infecting Treponema through bioinformatic characterisation of prophages present in the genomes of 24 Treponema isolates of varying species from diverse environments.
Four prophage identification approaches (PHASTER, PHASTEST, geNomad and manual inspection) were used in this study, together with CheckV, to improve the accuracy of prophage prediction. These were supplemented by four different programs for virus classification (VICTOR, VIRIDIC, VirClust and ViPTree), each with differing strengths, which provided further supporting evidence for the identifications by recovering similar predicted phage clusters. Through this workflow, examination of Treponema genomes yielded 37 previously uncharacterised prophage regions (38 in total), including three clusters (named A, B and C) of closely related phages.
It is notable that the closely related phages of clusters A, B and C are all present in the same species, T. phagedenis. Three of the T. phagedenis strains examined in this study were isolated from bovine digital dermatitis lesions and are considered pathogenic, while the remaining two strains were isolated from humans and are considered saprophytic and nonpathogenic [34]. T. phagedenis genomes examined to date appear to encode fewer antitoxin systems than other Treponema species [34], which may make T. phagedenis more susceptible to larger prophage burdens.
Based on the presence of specific genes encoding tail structures, all the putative prophages identified are predicted to have myoviral or siphoviral morphology. In 2022, the ICTV introduced significant updates to the phage classification system [5]. As a consequence of these revisions, Treponema phage td1 [28], the sole Treponema phage documented to have the excised prophage DNA
In addition to previously demonstrated induction of prophages from T. phagedenis Reiter [26] and T. denticola [28], the observed genomic characteristics of the identified prophages suggest that several may have retained the functional capacity to form infectious particles.
However, prophages within each cluster display considerable differences in size, indicating that some may now be cryptic through deletion of prophage coding regions. Alternatively, co-evolution with the host bacterium may lead bacterial genes to integrate into the prophage genome, or redundant genes to be lost from the prophage during replication, resulting in differences in genome size between prophages from different bacterial strains [44].
Some genomes in this study encoded a substantial number of prophages. T. phagedenis B43.1 and T. phagedenis KS1 harboured the most prophage DNA, with more than 10% of their genome being of prophage origin. In other species, prophages have been noted to constitute up to 20% of the total genome [9]. Harbouring prophages can provide hosts with fitness benefits, including superinfection exclusion, antibiotic resistance and various virulence factors [45]. Whilst neither virulence nor antibiotic resistance genes were detected by Pharokka and PhageLeads in any of the Treponema prophages in this study, PADLOC did detect three prophage regions containing anti-phage defence systems, which protect the host against further phage infection, favouring both the host and the prophage [46]. Prophages T320AP1 from cluster C and B43P8 from cluster B included a type II R-M defence system, and B43P2 from cluster B contained a Thoeris defence system. The Thoeris system is an abortive infection system comprising two proteins: ThsB has a toll/interleukin-1 receptor (TIR) domain, which is activated by phage infection and produces signalling molecules. These activate ThsA, which contains a domain that binds and hydrolyses nicotinamide adenine dinucleotide (NAD+), depleting the NAD+ pool and leading to cell death [47,48].
In contrast to the high prophage burdens of some strains analysed in this study, thirteen treponeme genomes were apparently devoid of any prophage-related sequences. These include the three T. pallidum genomes, which were expected to lack extraneous DNA owing to their extremely reduced genomes and their dependence on their hosts to meet their metabolic requirements [29]. The lack of prophages in the remaining ten Treponema strains could have several explanations. Firstly, prophages could have been present but not identified. Identifying a prophage in a bacterial genome can be difficult for many reasons, including: (i) a lack of annotation of the bacterial genome; (ii) only a few phage-like genes being present in a short sequence region; (iii) only a remnant of a once-functional prophage remaining; or (iv) prophages being undetectable within a bacterial genome that is considered fully, but incorrectly, annotated [49]. Secondly, strains lacking prophages may, by chance, have been chosen for sequencing [9]. A third explanation is that no prophages are present in those bacterial genomes, as a common finding is that only around 50% of bacterial species analysed are lysogens [50,51].
When seeking hypotheses to account for the absence of prophages in thirteen of the Treponema genomes, we found none of the patterns reported in previous studies [50,51], such as associations with the minimum doubling time of the host, genome size, CRISPR-Cas systems or pathogenicity. The T. pallidum genomes are small, at 1.1 Mb in length, and have no CRISPR-Cas systems; however, they are pathogens. The remaining ten Treponema genomes without prophages are of a similar size to those of the lysogens, and all contain CRISPR-Cas systems except T. vincentii, suggesting that these correlations may not hold across all bacterial taxa. However, several of the strains that appeared devoid of prophages here were the sole representatives of their species, so the presence of prophages within the wider species cannot be ruled out.
This study had several limitations, including the ability to investigate only a subset of Treponema genomes and reliance on prophage identification software developed or trained on known phages. Prophage integrase genes are usually located adjacent to, or very near, the attachment site on the phage chromosome and so can typically mark one end of the integrated prophage [9,52]. However, it can be difficult to distinguish the actual end of the prophage from the start of the bacterial genome. Here, we manually double-checked the geNomad results to estimate the beginning and end of each prophage region as accurately as possible, as phage genomes show distinct gene clustering according to general function [9].

Conclusions
In this study, we describe 38 prophage-like sequences identified in 24 Treponema genomes, substantially increasing foundational knowledge of phages infecting treponemal species. The majority of the 38 prophage regions appear to be distinct from any bacteriophages described to date, and we present strong evidence for the presence of highly diverse prophages, including three distinct prophage clusters within T. phagedenis strains, as confirmed by four independent analyses. These data will aid the future characterisation of potential treponemal prophages in existing and future genomic and metagenomic datasets. The data also provide compelling evidence for the presence of several potentially functional prophages, and further research could identify prophages with potential as therapeutic agents against a genus of medical importance for both humans and animals.

Detection of prophages in Treponema species
Representative Treponema species with complete genome sequences and valid GenBank accession numbers available in the RefSeq database (https://www.ncbi.nlm.nih.gov/refseq/, accessed on 11 December 2022) were analysed, yielding a total of twenty-four complete Treponema genome sequences. These were screened for prophages using PHASTER (PHAge Search Tool Enhanced Release) [53], PHASTEST (PHAge Search Tool with Enhanced Sequence Translation) [54] and geNomad v1.7.4 [55], using the end-to-end modules and default options. Each bacterial genome was also manually inspected using Artemis v18.2.0 [56], a genome browser that allows visualisation of sequence features. Each genome was surveyed for potential prophage regions based on the following criteria: (i) already-annotated, reasonably conserved phage proteins, such as integrases, portal proteins, terminases and tail tape measure proteins [10]; (ii) consecutive hypothetical proteins; (iii) putatively co-transcribed, contiguous open reading frames; and (iv) encoding on the same DNA strand [52]. The beginning and end of each prophage sequence were estimated both by geNomad and manually, using (i) the presence of integrases [52], (ii) the point at which genes were again annotated and were likely bacterial in origin, and (iii) the point at which genes began to switch between DNA strands again. The candidate prophage-like sequences were then saved, and CheckV [57] was used to assess the quality of the viral genomes. Any sequences with no viral genes detected were removed from the study.
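The manual criteria above, particularly contiguous genes on the same strand and conserved phage hallmark proteins, can be approximated programmatically. The sketch below is a deliberately simple, hypothetical heuristic rather than the method used here or inside geNomad: it scans for maximal runs of CDSs on the same strand and reports runs containing at least two hallmark keywords (the keyword list and the threshold of two are assumptions). Real prophages often carry genes on both strands (for example, lysogeny modules), so such runs would only seed the manual boundary review.

```python
from __future__ import annotations

from typing import Iterator, NamedTuple


class CDS(NamedTuple):
    start: int
    end: int
    strand: str   # "+" or "-"
    product: str


# Hypothetical hallmark keywords, based on the conserved proteins listed in criterion (i).
HALLMARKS = ("integrase", "portal", "terminase", "tape measure")


def same_strand_runs(cds_list: list[CDS]) -> Iterator[list[CDS]]:
    """Yield maximal runs of consecutive CDSs encoded on the same strand."""
    run: list[CDS] = []
    for cds in cds_list:
        if run and cds.strand != run[-1].strand:
            yield run
            run = []
        run.append(cds)
    if run:
        yield run


def candidate_regions(cds_list: list[CDS], min_hallmarks: int = 2) -> list[tuple[int, int, int]]:
    """Return (start, end, hallmark_count) for same-strand runs containing at least
    `min_hallmarks` CDSs whose product name mentions a hallmark keyword."""
    regions = []
    for run in same_strand_runs(sorted(cds_list, key=lambda c: c.start)):
        hits = sum(any(h in c.product.lower() for h in HALLMARKS) for c in run)
        if hits >= min_hallmarks:
            regions.append((run[0].start, run[-1].end, hits))
    return regions


# Hypothetical annotated stretch of a bacterial chromosome.
genes = [
    CDS(1, 900, "-", "ABC transporter"),
    CDS(1000, 2200, "+", "integrase"),
    CDS(2300, 2800, "+", "hypothetical protein"),
    CDS(2900, 4500, "+", "terminase large subunit"),
    CDS(4600, 6000, "+", "portal protein"),
    CDS(6100, 7000, "-", "DNA gyrase"),
]
print(candidate_regions(genes))
```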
The determination of phage morphology relied on the presence of specific structural proteins.The presence of a tail sheath protein indicated prophages with myoviral morphology (contractile-tailed phages) [40,41].Conversely, the presence of a tail tape measure protein without a tail sheath protein indicated siphoviral morphology [40,41].
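The morphology rule above reduces to a two-step decision. The sketch below encodes it over a list of CDS product names; the substring matching and the "unassigned" label for prophages carrying neither marker are assumptions added for illustration, not part of the described method.

```python
from __future__ import annotations


def predict_morphology(products: list[str]) -> str:
    """Predict tail morphology from CDS product names, following the stated rule:
    tail sheath -> myoviral; tape measure without tail sheath -> siphoviral."""
    names = [p.lower() for p in products]
    has_sheath = any("tail sheath" in n for n in names)
    has_tape_measure = any("tape measure" in n for n in names)
    if has_sheath:
        return "myoviral"     # contractile tail
    if has_tape_measure:
        return "siphoviral"   # tape measure protein but no tail sheath
    return "unassigned"       # hypothetical label: neither marker found


print(predict_morphology(["portal protein", "tail sheath protein", "tail tape measure protein"]))
print(predict_morphology(["portal protein", "tail length tape measure protein"]))
```

The sheath check comes first because contractile-tailed phages also encode a tape measure protein, so the presence of a tape measure protein alone cannot distinguish the two morphologies.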

Fig. 1
Fig. 1 Bar chart showing the number of prophage regions estimated by each detection method: PHASTER, PHASTEST, geNomad and manual inspection

Fig. 2
Fig. 2 Phylogenetic tree generated by VICTOR using the predicted genome sequences of the 38 prophage regions. Three clusters of prophages with genetic similarities were identified (A, B and C). The colours in the key indicate which prophages are predicted by VICTOR to belong to the same family, genus or species, as well as their GC content and genome size. Treponemal species are designated at the start of the phage name: T. azotonutricium (Azo), T. primitia (Prim), T. ruminis (Ru), T. bryantii (Bry), T. denticola (td), T. vincentii (Vin) and T. phagedenis (KS1, B43, 27087, Reiter and T320A)

Fig. 6
Fig. 6 Comparative genome alignment of prophages comprising cluster A. Phage genomes are presented alongside their designated name and genome length. Coding sequences are represented by arrows, coloured to reflect homologous groups identified by Clinker, and are linked by grey bars shaded to represent the percentage amino acid identity, as indicated in the legend

Fig. 7
Fig. 7 Comparative genome alignment of prophages comprising cluster B. Phage genomes are presented alongside their designated name and genome length.Coding sequences are represented by arrows, coloured to reflect homologous groups identified by Clinker, and are linked by grey bars shaded to represent the percentage amino acid identity, as indicated in the legend

Fig. 8
Fig. 8 Comparative genome alignment of prophages comprising cluster C. Phage genomes are presented alongside their designated name and genome length. Coding sequences are represented by arrows, coloured to reflect homologous groups identified by Clinker, and are linked by grey bars shaded to represent the percentage amino acid identity, as indicated in the legend